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I. INTRODUCTION 



A common problem in analysis of experiments or of Monte Carlo simulations is fitting 
a parameterized model to the average over a number of samples of correlated data values. 
In particular, lattice QCD calculations typically require fitting operator correlators, which 
are a function of distance between the operators, to sums of exponentials with unknown 
amplitudes and masses. If the number of samples is not infinite, estimates of the variance 
of the parameters ( "error bars" ) and of the goodness of fit are affected. This can be viewed 
as a generalization of the well known rule "replace N by N — 1 in the denominator" in 
calculating the error on an average to the case where the error is on a parameter estimated 
by a fit to correlated data points. We calculate approximate corrections to the variance 
of the parameters (see Fig. [1] for a graphical example) for estimates made in the standard 
way from derivatives of the parameters' probability distribution as well as from jackknife 
and bootstrap estimates. (The distribution of parameter estimates is not exactly Gaussian, 
so the variance of the parameters is not quite the whole story.) Without compensating 
for sample size effects, none of these methods give unbiased estimates of the parameters' 
variance. 

Many numerical simulation programs or experiments involve two or more stages of fit- 
ting, where the parameters resulting from the first stage are the data input to the second 
stage. For example, in computations of meson decay constants in lattice QCD the first stage 
involves fitting a correlator of meson operators to exponentials and extracting the mass and 
amplitude, and the second stage involves fitting these masses and amplitudes to functions 
of the quark masses and lattice spacings to allow extrapolation to the chiral and continuum 

n n, n 

limits. (See for example Refs. |l| and [2|.) For example, in Ref. jJLf] about 600 hadron masses 
and amplitudes are computed in the first stage of fitting, and these 600 numbers and their 
(co)variances are in turn the data for the fitting in the second stage. While an unbiased 
estimate of the variance of the parameters is always welcome, it is particularly important 
in this case since many parameters in the first stage of fitting are used as inputs (data) 
in the second stage of fitting, and if their errors are systematically too large or too small 
the apparent goodness of fit in the second stage of fitting will be very good or very bad 
respectively. 
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II. THE PROBLEM 



We consider a problem where we need to fit a function of P parameters to an average of 
N samples, where each sample consists of D data points. We use subscript indices to label 
the component of the data vectors and superscript indices to label the samples. Thus x" 
is the i'th component of the a'th sample, with < i < D and < a < N. Each sample 
is assumed to be normally distributed, but the different components of the D dimensional 
sample are generally correlated. Averages over samples will be denoted by overbars. We will 
need to imagine averaging over many trials of the experiment, and we will use angle brackets 
to denote such an average: (xTi ). 

So, for example 

a 

~ N j 

a 

a b 



The covariance matrix ( "of the mean" ) for one trial is 

Q 3 = ^(xix- -xi xj) = ^$>^ - ^ fez?) fe*$) (2) 

This covariance matrix will fluctuate around the true covariance matrix, obtainable only in 
the limit iV — > oo. Note we use instead of in normalizing C^. For our purposes, the 
difference between these normalizations is best included with the other order -h effects to be 
discussed. 

Fit parameters p a , with < a < P, are obtained by minimizing 

X 2 = \ Xi -x{(p a )j {C' 1 ).. (x] -xfaa)) (3) 

. , . _ n 

where x i {p a ) is the value of x\ predicted by the model. As pointed out in Ref. [3j, since 
we are stuck with estimates of the covariance matrix and the Tl obtained from the same 
samples, they are correlated. 
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First change to a convenient coordinate system (alas, available only in theory, not in 
practice). For the moment we assume that our fit model is good, so that the x[(p a ) can be 
adjusted to equal the true averages of the Xj. Shift the coordinates so that (xl ) is zero. Then 
rotate the coordinates so that the true covariance matrix is diagonal, and rescale them so 
that («,»> = 1. (So fat, we have Mowed Re, fl.) We now have <^> = M*. and the 
true covariance matrix is the unit matrix. 

Make a further rotation so that the changes in the x{(p a ) as the p a vary around their 
true values are in the first P components, and so that the changes in the x{(p a ) as the first 
parameter p varies are in the first component. Now we can rescale po so that = 1, which 
simply means that po is the average x~o . In doing this we have assumed that po is linear 
enough in the ~x[ or that the fluctuations in the $i £1X6 small enough. 

In this basis, write the covariance matrix (from the data in this experiment) and its 
inverse in blocks, 



C 



c- 1 




(4) 



where the matrices U and A are P by P, V and B are P by D — P and W and E are D — P 
by D - P. 

Now x 2 is given by 

\ 2 (- .rf) (C '), ; (~ •'•]') , (5) 
where only the first P components of x{ are nonzero. For example, with two parameters 



2 / — / — / — 

x = [ Xl -x{,x 2 -x J 2 ,x 3 




^ X\ — x{ ^ 

X 2 ■-^'2 



(6) 



The x{ are found from minimizing % 2 : 

dx 2 







dxL 



(7) 
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where here and in many subsequent equations starred indices run from to P — 1 and primed 
indices from P to D — 1, and the factor of two comes from differentiating with respect to 
the x{ on both sides of Eq. [5] and using the fact that C~ x is symmetric. 
This is solved by 

x{» = x^ + A~} jt Bj* k ^ (8) 

From C C' 1 = 1 and Eq. gj 

UA + VB T = 1 
UB + VE = 
V T A + WB T = 
V T B + WE = 1 (9) 



Using the third of Eqs. [91 remembering that A and W are symmetric, we get an alternate 
to Eq. E 

B = -AVW- 1 
A~ l B = -VW~ l 

4 = xF - VrjWjlxu (10) 

From this equation we see that, in this basis, parameter number zero, Xq, does not depend 
on the other of the first P components, Tj* with 1 < i* < P. Thus the distribution of 
parameters depends only on the combination D — P = d. 
Similarly for % 2 : 



/ A B \ 




( -A~ l Bx \ 


Wei 


II 


l x ) 



X 2 = {-xB T A-\x) I „ J I I (11) 

= x7 (-B T A- l B + E) 
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Now insert W 1 W = 1 and use the third and fourth equations in [9] 

X 2 = x W~ l (-WB T A- 1 B + WE) x 
= x W~ l (V T AA- l B + WE) x 
= x W~ l (V T B + WE) x 

= x^ W^xjr (12) 

But W is just the covariance matrix for the last D — P components of x in this basis, so 
the statistical properties of x 2 are exactly the same as a D — P dimensional problem with 
no fit parameters, and the distribution of x 2 , as expected, depends only on the number of 
degrees of freedom, d = D — P. We note that the distribution of x 2 (more properly, T 2 ) is 
known. Since it is important here and closely related to the estimates of parameter errors, 
we quote the result in Appendix I. 

III. NUMERICAL EXAMPLE 

To illustrate the effects of sample size, we begin with a numerical example, using the 
basis described above. In this example, N Gaussian distributed random data vectors with 
D = 25 were generated. The data was fit with P = 5 parameters, which are just the first 
P components of the average data vector. This was repeated for many trials. The black 
octagons in Fig. CD show iV times the variance (over trials) of one of the parameters, where 
the asymptotic value is one. These black octagons are the correct answer for the variance of 
the parameter, and this is the variance that we wish to estimate from our experiment, where 
we only have one trial to work with. We see that for finite N the parameters fluctuate by 
an amount larger than the asymptotic value. 

We also show the average over trials of the variance estimated from derivatives of the 
parameter probability, the average over trials of the variance estimated from a single elimi- 
nation jackknife analysis, and the average from a bootstrap analysis. For the jackknife and 
bootstrap, the plot contains average variances both for the case where the full sample covari- 
ance matrix was used in each resampling and where a new covariance matrix was made using 
the data in each jackknife or bootstrap sample. Red squares are average variances from the 
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FIG. 1: Number of samples times the variance (square of the error) of a parameter in fitting 
correlated data, and averages over trials of several methods for estimating this variance. The 
horizontal line indicates the asymptotic value, variance(xQ ) = 1/N. N is the number of samples; 
the meaning of the plot symbols is described in the text. 
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usual "derivative" method. Blue diamonds are from a single elimination jackknife analysis 
where a new covariance matrix was made for each jackknife sample. The two blue bursts 
(on top of the red squares) use the full sample covariance matrix in each jackknife resample. 
Similarly, the green fancy plusses are from a bootstrap analysis, using the covariance matrix 
from the original sample. The green crosses are estimates from a bootstrap analysis where 
a new covariance matrix was made for each bootstrap sample. 

We see that correct answer deviates from the asymptotic value for finite N, and that the 
various methods for estimating this variance produce biased estimates of the variance of the 
parameter. 

IV. LARGE N EXPANSION 

Most of the effects shown in Fig. [1] can be understood analytically. We can expand the 
covariance matrix in each trial around its true value, 

Cij = — {Sij + [XiXj 5{j X{ Xj )} (13) 

Here the term in parentheses has fluctuations of order 1/y/N and an average of order jr. 
Thus its square will also have expectation value ~ j?. 
Then 

- N( oc ^ oc j 5 t j oo ^ oo j j 

+ N (xix^ - 5 ik - X% Xk ) yOC^Xj dfzj Xk Xj j 

+... (14) 

Using the fact that integrals of polynomials weighted by Gaussians are found by pairing 
the xl in all possible ways, or making all possible contractions, we can develop rules for 
calculating these expectation values. We will use parentheses to list the pairings. For 
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example, with (12) indicating that the first and second x are paired, 

x- (12)(34) 



+ Xi XiXj Xj (13) (24) 

+ xl x~x~x~ (14)(23) (15) 



Using 
and 

a 

we get the Feynman rules for contractions of barred quantities. 

1. Each contraction gives a 8ij for the lower indices it connects. 

2. Each bar gives a j±, whether it covers a single x or two, Wl or xix] — see EqJT] 

3. Each continuous line made of overbars and contraction symbols gives a factor of 
N. This is from the J] ab S ab S bc . . ., which has N nonzero terms. For example, 
x~i XjXk xl (12) (34) is one continuous line, while xi x]Xk xj (14) (23) is two lines (one 
is a loop). This results in every loop giving an extra factor of iV relative to other 
contractions with the same number of fields. 

Since an open line (not a loop) with C contractions has 2C x's and C + 1 bars, but a 
loop with iV contractions has 2C x's and C bars, these rules can be rephrased as: 

1. Each contraction gives a 5ij for the lower indices it connects. 

2. Each x gives a factor of 1/yN. 

3. Each loop gives a factor of N. 

In the expansion of C^ 1 we find the combination which we will denote 

by Xjcc] . This occurs frequently enough that we should state special rules for it. 
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In evaluating an expression containing X{Xj there will be contractions where the Xi and 
Xj 111 XiXj £1X6 contracted with each other. These contractions just cancel the 5ij. The terms 
with xi and x] contracted give a —Sij/N. Thus a "tadpole" where xix] (12) contracts with 
itself just gives a —Sij/N. (This includes the iV -1 / 2 from each of the x 's.) 

Now consider terms where X-^Xj IS part of an open line, like 

xlxWj x] (12)(34) (16) 

In this case the xix] and the — xi x] cancel, so xix] can never be part of an open line. But 
if this object is part of a loop, like in 

f^f^(13)(24) (17) 

the xix] part is part of the loop, but the — x] x] part breaks the loop. Thus the four paths 
hidden in these double bars give 

N-2 + l = (N-l)8 ik 8 jl (18) 

Similarly, a loop of three double bars gives 2 3 = 8 terms, N — 3 + 3 — 1 — (N — 1), and any 
loop made up entirely of Tix] 's gives a factor of — 1 times the appropriate Kronecker 5's. 
As trivial examples, 

(xl x] ) = xi x] (12) = ^Sij (19) 
(xix] ) = x~ix] (12) = 5ij (20) 



{Ndj) = 1 + xix] (12) = 




We are also interested in the variances of averaged quantities. For the variance of some- 
thing, var (X) = (X 2 ) - (X) 2 , we need the "connected part" of (X 2 ). We use a vertical bar 
to denote this, and we only need contractions where some of the lines cross the bar. 
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As an example, for the variance of an arbitrary element of C to lowest order, 



(var (Qj)) 



(pij) (Cij) no sum ij 



1 — \~ X^Xj 



N 2 

N- 1 

TV 4 
1 2 



1 I *ijC i^OC j 



NS 



(22) 
(23) 



x~x] (13)(24) + (14)(23) 



NS 



3 



iV 2 N 1 1 



(24) 



The last line is written to display that the fractional variance on the diagonal element is j*. 

In equations where the components are separated into starred indices, < %* < P and 
primed indices, P < i' < D, contractions of primed with starred indices are zero, contractions 
of starred with starred indices give delta functions with 5i*i* = P, and primed with primed 
use Si'ii = D — P. 

V. VARIANCE (AND HIGHER MOMENTS) OF THE PARAMETERS 

In this section we examine the variances of the parameters - that is, the error bars on 
our answers. First we calculate how much the parameters actually vary over many trials of 
the experiment. Then we calculate the average of common ways of estimating this variance 
- from derivatives of the probability, from an "eliminate J" jackknife analysis or from a 
bootstrap resampling (using either the covariance matrix from the full sample, or a new 
covariance matrix made from each jackknife or bootstrap resample). The differences allow 
us to find and correct for bias in our error estimates resulting from the finite sample size. 

For the actual variance of our parameters, use Eq. [101 Since we are in a coordinate system 
where the average of this quantity is zero, we don't need to worry about taking the connected 
part. 



11 



+ 2 (x^ VojW^xu > (25) 

Since V and W are made entirely of double bars and can therefore only be part of a loop, 
and primed indices can't contract with index zero, the middle term (cross term) is zero. 
We compute this to order 

] x l x o ) = (^o ) 

^ X j' (Pj'rn' XjiX m > ~\~ Xj'Xp' Xp'X m ' . . .) (^X m iXQ ) 

{x X n > ) (<W - Xn'Xk' + X n >X T > X r <X k ' . . .) X^ ^ (26) 

The leading term, (xo Xo ), is just 
The term with six x's has only one contraction: 



xy xj'Xq x x k > x k > (16) (25) (34) 



where d = D — P. 

There are two equal terms with eight x's. There are three nonzero contractions of this 
term. Ignoring the N~ 4 parts, these are 

_ _2 

- 2x]7 Xj>x m > x m ix x^x^ x^ (18) (27) (34) (56) = -j^5ji k i5j> k >5 m i m >5m 

_ __ _ _l-2 

-2S~7 XfX m > x m 'X x x k ' x^ (18)(23)(47)(56) = j^S jfk >5 j/m/ 5 m > k >5 o 

-2 



-2xj> Xj>x m > x m >x x x k > x k > (18) (24) (37) (56) = j^5 jlk >5 rm i5 m > k >5 m (28) 



Here the plus sign on the second contraction comes from the tadpole. The second and third 
contractions cancel, so we just have 

12 



To order we only need two loop contractions from the terms with ten x's. There are 
three such terms, but two of them are equal. 

i^Xj' Xj'X m ' X m 'XQ XqX u ' X n 'Xy Xy ^ 
+2 (Wf XjiXpi X p rX m r X m 'Xo XQXk' Xy ) (30) 



Each term has two contractions: 



Xy Xj'X m ' X m 'XQ XQX n ' X n 'Xy Xy (1,10)(28)(39)(47)(56) 



+ Xy Xj'X m ' X m 'Xo X X n i X n 'Xy Xy (1, 10) (29) (38) (47) (56) 

= js( d + d2 ) (31) 



Z Xji Xj'Xpi Xp'X m ' X m 'XQ XQXk' Xy (1,10)(24)(35)(69)(78) 



+ 2 xy xj'Xpf x p 'X m i x m /x XQX k ' xy (1, 10) (25) (34) (69) (78) 

= ^- 3 ( d + d2 ) (32) 



Putting it all together, 



1 | d | d ^ d + 2 ) | (33 ) 
~ N + N* + N* + • • • [66) 



Thus the fluctuations in the parameters are larger than the asymptotic value j^. from the 
covariance matrix. 

As noted above, the probability distribution of the parameters is not exactly Gaussian. 
Higher moments of this distribution can be obtained in the same way. At leading order in 
there is only one independent diagram for the connected part of each moment, and we find, 
for M even, 

/ connected 
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VI. ESTIMATES OF THE PARAMETERS' VARIANCE 



In practice, the most common method for estimating the variance of the parameters is to 
use the covariance matrix for the parameters. (See, for example, Ref. [5|.) In our coordinate 
system, this matrix is just A' 1 , and our estimate for the variance of parameter zero is 
(v4 _1 ) 00 . Using the third and first of Eqs. [91 

B T = -W~ l V T A 

1 = \JA-VW~ X V T A 
A' 1 = U -VW' X V T (35) 



Then, our estimate for the variance of parameter zero is 

V(1t{Xq) derivative = Aqq 

— Uoo — Vok'W kl l,ViF 
= (S 00 + xp^ ) 

- — (w ) {h'V ~ XVXV + Xk'Xm' Xm'XV ■ ■ •) xF^o (36) 



For the order jt correction we only need the Sw from W x , and find 



Nvar(xl) denvaUve = 5 00 + m (12) - fp? (14)(23) 

1 N-l /n 



+ (37) 



The order 1/N 2 contribution to this estimate vanishes, as sketched in Appendix II. If D = 
P = 1 this is just jr (l + xo^o ) = 1 — 4, the standard correction for a simple average, 
reflecting our normalization of the covariance matrix. Comparing to the desired result in 
Eq. [331 we see that this is an underestimate of the variance of the parameters. The difference 
between this error estimate and the correct one above is that this estimate assumes that the 
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covariance matrix remains fixed while the data points vary, while the correct answer takes 
into account the correlations between the data points and the covariance matrix (constructed 
from these same data points). 



VII. VARIANCE OF JACKKNIFE AND BOOTSTRAP PARAMETERS 

The variance of the parameters is also often estimated by a jackknife or bootstrap analysis. 
In these methods the fit is repeated many times using subsets of the data sample, and the 
variance of the parameters is estimated from the variance over the jackknife or bootstrap 
samples. Both the jackknife and bootstrap can be done either using the covariance matrix 
from the full sample in fitting each jackknife or bootstrap sample, or by remaking a covariance 
matrix for each resample. Using the full sample covariance matrix amounts to seeing how the 
parameters vary with fixed covariance matrix, that is, by varying x^ and Ty in Eq. [8] with 
A~[}^Bj*y held fixed. This is the same question as is answered by var(xl) derivative in Eq. [37J 
Since the change in the parameters is linear in x~j* and ~xy , it doesn't matter if the xi are 
varied infinitesimally (by taking derivatives) or slightly (jackknife) or fully (bootstrap). In 
this case, the variance of the parameters will have the same bias as does Eq. [37J — no new 
calculation is necessary, although there is a slight difference due to the normalization of the 
covariance matrix used here. 

Remaking the covariance matrix for each resample includes correlations of the covariance 
matrix and data, but not in quite the desired way. The calculations above can be extended 
to calculate the expectation value of the parameter variance for the jackknife analysis in 
which the covariance matrix is recomputed for each jackknife sample. An "eliminate J" 
jackknife consists of making N/J resamples, each omitting J data vectors (numbers nJ 
through (n + 1) J — 1), and hence having Nj = N — J elements. We will denote averages 
in the n'th jackknife sample with a superscript (n). The average of x a in the n'th jackknife 
sample is 



where J data vectors (starting with number nJ) were deleted from the full sample. The 




(38) 
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variance of this quantity (over the jackknife samples) is 

J 



(39) 



N(N - J) ' 

so we generally multiply the variance over the jackknife samples by to get the expected 
variance of the mean jr. 

We now compute the variance of the parameters in the jackknife fits. In doing this we will 
need averages of products of quantities from different jackknife ensembles. Without losing 
generality, we can think of these as ensembles number zero and one, which differ only in their 
first J data elements. Thus, expectation values of sums over values in different ensembles 
may produce factors of Nj — J instead of Nj, where Nj is the number of samples in the 
jackknife, and is really N — J. (Nj — J = N — 2 J is the number of samples in common 
between two different jackknife resamples.) 

For example, using (n) to denote quantities in the n'th jackknife sample (xj is the j'th 
component of the a'th data vector in jackknife sample {n)), for n ^ m, 



<4 n) 4 m) ) 



J ab 



2 u jk 



(40) 



where we define 5 ab = 1 if a = b and a, b e (J,N— 1), otherwise. Thus the sum over a 
and b gives a factor of Nj — J instead of Nj. 
From Eq. [10], parameter in jackknife fit (n) is 



xf = 4 n) +V$Wj$ n >xP (41) 



and the variance of this parameter over the jackknife samples is 

J 



varj(xl) 




m 

V®WfF»W V>rl^f )) ) («) 



V 
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This is more complicated than Eq. [25] because the mean over jackknife samples is not 
exactly zero. Also, the sums over sample vectors now sometimes give Nj, sometimes Nj — 1, 
and sometimes Nj — J, so some of the shortcuts developed above won't work any more. 
Note the J/N is correct - there are N/J jackknife resamples, each containing Nj = N — J 
elements. 

In Eq. H2] the sums contain terms where n = m and terms where n ^ m. Separate the 
diagonal and off-diagonal terms in the sums, and use the fact that all non-diagonal terms 
are equal, Ylim contains N/ J — 1 terms with m ^ n, and ^2 mp has N/ J diagonal terms and 
(N/J)(N/J - 1) off diagonal: 



varj(x f ) 



1 - — I r 



-I 1 ' 
- 2(l 

+ 2 ^1 

+ fi- 



r (n) _(n) 
J \ (n) (m) 



J 



(n) v (n) w -l(n) (n) 
; x v 0i' vv i'k' x k' 



^ \ (n) j r(iTi) -rj T — l(m) (m) 



j 



™(«) TT/- 1 ( n )T/ T HT/( n )T / r/- 1 (")„W 



J 



( 1 — I <t» H7-l(")v' r ('')i/Hw-l("') T W 



(43) 



where n ^ m. To evaluate this expression we need: 



w 

(&) 
(c) 
(d) 

(e) 
(/) 



(4 



(n 



(4 



(fl 



<4" 



a; 



(n) 



) 



T M \ _, 

/n^tm 



v 0«' Kl Vfc' X fc' / 

T/( m )w- 1 H T M \ 

T/f/ _1(n) T/ T(n) T/ (n) T/T/" 1(n) -r (n) \ 
^j'i' 'VO K 0j' ^j'fc' x fc' / 

W -Hn) v T{n) v {m) w -l(m) Am) v 



(44) 



Here (a), (c) and (e), which involve only jackknife sample (n), are the same as in the previous 
section with the replacement of N by Nj. Because W -1 ^ and V^ n > consist only of double 
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barred quantities which can't be part of an open line, and primed and unprimed indices 
can't contract, (c) vanishes. For (d) we can imagine expanding all the TiX] ("^'s into pieces, 
(TiX] ( m ) — 5ij — Wi ^x] (m )). and making all contractions. There is only one factor of x 
from jackknife sample (n), which must contract with something from (m). Thus, all of these 
terms differ from (c) by replacement of exactly one factor of Nj by Nj — J, and therefore 
also sum to zero. 

Similarly (e) and (/) differ by the replacement of one or more factors of Nj by Nj — J. 
Thus, their difference will be one order in less than their value. This means that to get 
the first correction to the asymptotic form, we need only keep the lowest order term in part 
(e), and the analogous contraction for part (/). 



(a) (r (n) r (n) \-J_V r a(n) r {n) - — 
\ a ) \ X X I — pr2 / X ~ AT 

J ab 3 

(h) (Jri \ - — V T a{n) T b(m) - Nj ~ J 

\ u ) \ X x / ~ jw2 / v X X — at2 

J ab J 



(e) 


/ (n) (n) 


(n) 
7c ; 


4"M 






(n) (n) 
— j' 3 


(n) 


(n) 
X k' 


(16)(25)(34) 




N J ~ 1 , „ 

= , (D 
N 3 j { 








(/) 


i ( n ) ( n ) 


(m) 


J m ) ' 

X k' , 


> 




(n) (n) 

dj jf Jjj'JjQ 


(m) 


(m) 


(16)(25)(34) 




Nj-2J- 






P) 









(45) 



Here (a), (c) and (e) are the same as in the previous section. To evaluate (/) we need 
to separate the two terms in the double overbar (the delta function isn't there since the 
indices can never be equal), since they may give different numbers of factors of Nj — J. This 
evaluation proceeds as: 
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/ r\ ^, / (rt) In) (m) (m) \ 

I T I ' / ry * ' <-y» , <-y» » / ry _ ry v ' ry v ' \ 

UJ ~ \ X j> X j' X X X k' X k' In 



(n) in) (m) (m) 

ry v ' ry , ry *■ ' ry ry v ' ry v ' 



(n) (n) (n) (m) (m) 

<-y» v ' ry N ' ry N ' /y n If* T* 



(n) (n) (m) (m) (m) 



X,-, Xj'Xq Xq X fc , X fc , 



. (n) (n) (n) (m) (m) (m) 
T" X\v ^-i/ x*q Xq x k' X k r 



_ _j / «( n )~K") Hn) <m) d(m) /(m) 

~~ jy6 \ j' i' fc/ fc/ 

abcdef 

a(n) b(n) b(n) d(m) e(rre) f(m) 

— X-i X-i x x x k , x k , 

a(n) b(n) c(n) d(m) d(m) f(m) 

— X-i X-i Xq Xq x k , x k , 

I a(n) b(n) c(n) d{m) e(m) / (m) ' 

"T X^/ X^-/ Xq Xq x k / x k i 



Now the (16) (25) (34) contraction gives 

J abcdef V 



bd 



8« k ,8 af S,k'S be S nn S bd 



Jj'k'V Vj'k'V <w 

- 5j> k >6 af 5j>k'S M <W 
+ <W5 a/ <W5 be 5 00 5 cd 



= ^ (JVj (Nj - J) 2 - 2iV, (Nj - J) 2 + (TV, _ J) 3 ) (D - P) 
= ± (JV*-2JV5J-N5 + ...)(I>-P) 
= ^ (iV J -2J-l + ...)(P-P) 

Putting the pieces together, the variance of the parameter over the jackknife sample 



NJ \N 2 j iV 3 

N-J\f J \ r 2(D-P) 

• ' 1 + -^-r- T — - + • • • 



N J \(N -J) 2 J \ N 
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Comparing with Eq. [391 which is for D = P, we see that there is an extra factor of 
1 + 2<yD ^ pS) (independent of J). However, by comparison with Eq. [33] we see that this effect 
is too large by a factor of two, so the jackknife variance for the parameters is also biased. 

The leading corrections to the bootstrap estimate of the parameters' variance can be done 
in a similar way. To be specific, our bootstrap procedure is to make B resamplings, each 
made by choosing N data vectors with replacement from the original set of N vectors, and 
calculate the variance of the parameters over the bootstrap resamples. Similarly to Eq. 
the average over trials of the bootstrap estimate of the variance is 



var B (x' ) = ( U-> -4' , ^'" , ^ W -|£(4'" ) W#*Vff*] 

\ m 



which, after separating diagonal and off-diagonal terms in the sums, becomes 




(n) (n) 
x 



(n) (m) 

ry v ' / >"» v ' 

x x 



zx v 0i> vv i>j> x k' 



i o~W uHw-lH Am) 

' ZX V Qi> VV i'j' X k' 

i J n ) T / T/- 1 ( ri )T/ T(n h/ (n) M/~ 1(?l) 'r (n; 

1- Aji Wj'ii Vj, V j, Wj, k , x k , 



X j' VV j'i' V i'0 V 0j' VV j'k' X k> I \ 0L ) 



The overall (l — is the expected factor for difference between the average over the original 
sample and average over bootstraps. Label the parts as in Eq. HH where now (n) means the 
ra'th bootstrap resample. 
For part (a), 

(^^ y ) = ^E(-o {n) 4 (n) ) (52) 

ab 

where, in this section, the superscript a(ri) means the number of the data vector in the 
original set that was chosen to be the a'th member of bootstrap resample (n). For example, 
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if for N = 3 our bootstrap ensemble members were members 0, 1 and of the original 
ensemble, then 0(n) = 0, l(n) = 1 and 2(n) = 0. We will get contributions with nonvanishing 
expectation value when a{n) = b(n). If a member of the original ensemble is chosen m times 
in the bootstrap sample, then there will be m 2 contributions. Thus the total is the sum over 
all members of the original ensemble of the square of the number of times that member was 
chosen for this bootstrap sample. The probability distribution for the number of times a 
member appears in the bootstrap sample is a binomial distribution with probability p = 1/N. 
The average square of the number of times a member appears in a bootstrap resample is 
just the second moment of this distribution, etc. 

<(*)> = 1 
<("*) 2 > "2-1 

<(^) 3 > = = 5 -^ + 4 (N>2) (53) 



Thus the expectation value of (a) is j^N (2 — = -j| — 

Part (b) is the expectation value of the number of times a member was chosen in bootstrap 
resample (n) times the number of times it was chosen in resample (m). These two are 
independent, so we get just the product of the averages, or 

For part (c), break the double bar into its two components. 

(c) = -2 (a* (">a=5a* (n) *7 (n) > + 2 (a* ^x~ V>xj ^ W) 

= ^ E (4 n) 4 n) *r4 n) } + ^E (4 n) 4 n) ^4 n) ) (54) 

abc abed 

In the first term we get a contribution when a(n) = b(n) = c(n). For each of the iV members 
of the original ensemble we therefore get n z a terms, where n a is the number of times that 
member appeared in the bootstrap resample, so we get N (5 — . . .) (D — P), where the D — P 
is from the implicit sum over %' . In the second term we get contributions when a(n) = b(n) 
and c(n) = d{n). The probabilities of these two conditions are not quite independent, since 
if one member of the original ensemble is chosen multiple times in the bootstrap resample 
the other members will be chosen fewer times. This effect will be suppressed by a power of 
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■h, so to leading order we just have {n 2 ) 2 = AN 2 (D — P). Putting in the two and overall 
factors of iV from the left, (c) = ~ 2 ^i P ^ + . . .. 

Parts (d), (e) and / are done similarly, where to this order in we only need the loop 
contraction in parts (e) and (/). 

Putting it together 

var B (4) = (l " (l + £ ^ 1 ) (55) 

VIII. CORRECTING SMALL BIASES 

Once the biases in the various estimates of the error on the parameter have been calcu- 
lated, it is a simple matter to correct for them. In particular, we should multiply variance 
estimates from the derivative method by Fd eriv in Eq. [56j Note this assumes the covariance 
matrix was normalized as in Eq. [2j For the jackknife or bootstrap done with the full sample 
covariance matrix, multiply the variance by F reuse . This differs from Fa- eriv only in the 1 in 
the denominator, the well known correction for the difference between the sample average 
and the true average, which was not included in our normalization of C. For the jackknife 
or bootstrap analysis where a new covariance matrix is made for each jackknife or bootstrap 
sample, multiply the variance by F jackknif ^ remake or F bootstraPtremake . Of course, if you are 
rescaling error bars instead of the variance, you should use the square root of the factor 
below. (In F bootstraP)remake we assumed that the in Eq. [51] has already been accounted 
for.) 

1 + £ (D - P) + £ (D - P) (D - P + 2) . . . 



deriv 



F 



l-i(l+D-P) + & 
l + ±(D-P) + ^(D-P)(D-P + 2)... 



F jackknife, remake 1 \D P ) . . . 

Fbootstrap,remake 1 "J^ ■ ■ ■ (^6) 
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FIG. 2: Numerical results from FigJTJ together with the order and results from the previous 
section. The meaning of the symbols is the same as in Fig. [TJ 

IX. COMPARISON TO NUMERICAL RESULTS 

In Fig. [2] we plot the order forms for the variance of the parameter and the various 
methods of estimating it together with the numerical data. The horizontal axis has been 
inverted to 4. Figure [3] shows the same data, with the estimates for the variance corrected for 
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FIG. 3: Numerical results from FigQ] corrected for bias up to corrections of order for the 
jackknife with remade covariance matrices and order for the methods with fixed covariance 
matrix. 

bias (up to errors of order -^3 or -^). Here the lines for the actual variance of the parameter 
(black) and for the derivative or resampling with the full sample covariance matrix (red) are 
second order in while the line for the jackknife with remade covariance matrices (blue) 
is only first order in j,. As an aside, we note that although the lowest order corrections for 
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the bootstrap with remade covariance matrices are smaller than for the other methods, the 
next order corrections appear to be larger. 
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Appendix I 

Since estimating the goodness of fit is as important as estimating the errors on the pa- 
rameters, we quote some results here. Note that what we call x 2 (with the covariance matrix 
estimated from our data) is more properly called T 2 , but we stick with the common usage 
in the lattice gauge community. 

The probability distribution for y 2 is known {(]]. In terms of N and d, 

N- d ' 2 T(N/2) (d _ 2)/2 ^ , 1^ 2 Y N/2 



Prob(y 2 ) = K ' ' (^)^-2j/2 1 + _ 2 s 57) 

UJ r(d/2)r((JV-d)/2) \ N J 1 1 

We can compare to the \ 2 distribution: 

Prob( X 2 )=C(x 2 ) {d ' 2)/2 e~ x2/2 

and see that in the limit of large they are the same. 

From moments of Eq. \57\ we see that the mean and variance of x 2 depend on the sample 
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size. Using 

N J T(N/2) 

\ X I I(D,N) N ~°- 2 l-^ K ' 

I(D + 4,N) _ iV 2 f (gg) _ D(D + 2) 
{[X)) I{D,N) (^)(*=f>=4) (1-^)(1-^) 1 J 

(Note this is using our normalization of the covariance matrix). 

Taking the connected part, or variance, and expanding in j^, this is 

var( X >) = 2d(l + (61) 

Estimates of confidence levels, or probability (over trials) that \ 2 would exceed the value 
in your experiment, can be found by integrating Eq. [571 



Appendix II 

The customary estimate for the variance of the parameters, or from jackknife or bootstrap 
resamplings with the covariance matrix held fixed, has zero coefficient at the next order. 

NvCLT^Xq^) derivative = ^00 

= Uqq — VQ k iW k ,lVy Q 

= {Soo + XpO ) 

- "77 (xo~X^ ) (8 k i V - XyXv + X k iX m i X m iX V . . .) xjx^ (62) 
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Again, we only need two loop contractions from the terms with eight x's. 

N varest(po) = 5oo (63) 



+ x x (12) 



x x k ' x k 'X (14) (23) 



+ xtfc^ x k >x v xJTx^ (16) (23) (45) 
+ x&c^ x k >xv x]Tx^ (16) (24) (35) 
+ x^ x^xj xjTx^ (16)(25)(34) 



x x k > x k >x v x v x m > x m 'X (18) (27) (35) (46) 



- x x k > x k >x v x v x m > x m >x (18) (27) (36) (45) 

= 1 -L ( l + D-P) + ± (64) 



If D — P — 1 this is just jj (l + X\Xi ) = 1 — as it must be. 
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